Refactor evaluation scripts: update dataset path, simplify batch size logic, and enhance CSV processing
